Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
Authors
Abstract
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problem are considered. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an Õ(√(d³H³T)) upper bound on the duality gap and regret, where d is the dimension of the linear representation, H the horizon, and T the total number of timesteps. Our results do not require additional assumptions on the sampling model. Our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism with simultaneous moves, we construct upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving a general-sum matrix game with these confidence bounds as the payoff matrices. As finding the Nash equilibrium of such a general-sum game is computationally hard, our algorithm instead solves for a coarse correlated equilibrium (CCE), which can be obtained efficiently. To the best of our knowledge, such a CCE-based scheme for optimism has not appeared in the literature and might be of interest in its own right. Funding: Q. Xie was partially supported by the National Science Foundation [Grant CNS-1955997] and J.P. Morgan. Y. Chen was partially supported by the National Science Foundation [Grants CCF-1657420, CCF-1704828, CCF-2047910]. Z. Wang acknowledges the National Science Foundation [Grants 2048075, 2008827, 2015568, 1934931], the Simons Institute (Theory of Reinforcement Learning), Amazon, J.P. Morgan, and Two Sigma for their support.
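The key computational step described in the abstract is replacing an intractable Nash computation with a coarse correlated equilibrium of a general-sum matrix game. As an illustration only (not the paper's actual algorithm, which applies this step to confidence-bound payoff matrices inside value iteration), a CCE of a bimatrix game with payoff matrices A and B can be found by a single feasibility linear program over joint action distributions; the function name and the use of `scipy.optimize.linprog` are my own choices:

```python
import numpy as np
from scipy.optimize import linprog

def coarse_correlated_eq(A, B):
    """Find a CCE of the bimatrix game (A, B) via one feasibility LP.

    Variables: a joint distribution x over the m*n action pairs.
    Constraints: neither player gains by unilaterally committing to a
    fixed action instead of following the joint recommendation.
    """
    m, n = A.shape
    N = m * n
    rows = []  # inequality rows G @ x <= 0
    # Row-player deviations: E[A[i', j]] - E[A[i, j]] <= 0 for each i'
    for ip in range(m):
        rows.append((A[ip][None, :] - A).ravel())
    # Column-player deviations: E[B[i, j']] - E[B[i, j]] <= 0 for each j'
    for jp in range(n):
        rows.append((B[:, jp][:, None] - B).ravel())
    G = np.array(rows)
    res = linprog(c=np.zeros(N), A_ub=G, b_ub=np.zeros(len(G)),
                  A_eq=np.ones((1, N)), b_eq=[1.0],
                  bounds=[(0, None)] * N)
    return res.x.reshape(m, n)
```

Because a Nash equilibrium always exists and induces a CCE, this LP is always feasible; solving it is polynomial-time, which is the efficiency gain the abstract refers to.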
Similar resources
Value Function Approximation in Zero-Sum Markov Games
This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping probl...
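Value iteration for zero-sum Markov games generalizes the MDP case by replacing the max over actions with the value of a matrix game at each state. As a minimal sketch of that inner subroutine (the function name and LP formulation are my own, not taken from the paper above), the value and a maximin mixed strategy of a zero-sum matrix game with row-player payoff matrix A can be computed by linear programming:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and maximin mixed strategy of the zero-sum game A.

    The row player maximizes v subject to x^T A[:, j] >= v for every
    column j, with x a probability vector over the rows.
    """
    m, n = A.shape
    # Variables: (x_1, ..., x_m, v); linprog minimizes, so c = (0,...,0,-1).
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # Constraint per column j:  v - sum_i x_i A[i, j] <= 0
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])  # sum_i x_i = 1
    bounds = [(0, None)] * m + [(None, None)]  # v is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds)
    return res.x[:m], res.x[-1]
```

For matching pennies, A = [[1, -1], [-1, 1]], this returns the uniform strategy with game value 0, matching the classical minimax solution.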
Sampling Techniques for Markov Games: Approximation Results on Sampling Techniques for Zero-sum, Discounted Markov Games
We extend the “policy rollout” sampling technique for Markov decision processes to Markov games, and provide an approximation result guaranteeing that the resulting sampling-based policy is closer to the Nash equilibrium than the underlying base policy. This improvement is achieved with an amount of sampling that is independent of the state-space size. We base our approximation result on a more...
Flow Control Using the Theory of Zero Sum Markov Games
We consider the problem of dynamic flow control of arriving packets into an infinite buffer. The service rate may depend on the state of the system, may change in time, and is unknown to the controller. The goal of the controller is to design an efficient policy which guarantees the best performance under the worst service conditions. The cost is composed of a holding cost, a cost for rejecting custom...
Learning in Zero-Sum Team Markov Games Using Factored Value Functions
We present a new method for learning good strategies in zero-sum Markov games in which each side is composed of multiple agents collaborating against an opposing team of agents. Our method requires full observability and communication during learning, but the learned policies can be executed in a distributed manner. The value function is represented as a factored linear architecture and its str...
Sampling Techniques for Zero-sum, Discounted Markov Games
In this paper, we first present a key approximation result for zero-sum, discounted Markov games, providing bounds on the state-wise loss and the loss in the sup norm resulting from using approximate Q-functions. Then we extend the policy rollout technique for MDPs to Markov games. Using our key approximation result, we prove that, under certain conditions, the rollout technique gives rise to a...
Journal
Journal title: Mathematics of Operations Research
Year: 2023
ISSN: 0364-765X (print), 1526-5471 (online)
DOI: https://doi.org/10.1287/moor.2022.1268